Log-linear Models for Uyghur Segmentation in Spoken Language Translation

Authors

Chenggang Mi

Yating Yang

Rui Dong

Xi Zhou

Lei Wang

Xiao Li

Tonghai Jiang

Abstract

To alleviate data sparsity in spoken Uyghur machine translation, we propose a morphological segmentation approach based on a log-linear model. Instead of learning a model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation using both bilingual and monolingual corpora. It relies on several features, including a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that the proposed segmentation model for Uyghur spoken translation achieves an improvement of 1.6 BLEU points over the state-of-the-art baseline.
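As a rough illustration of how a log-linear model can combine features of this kind, the sketch below scores candidate segmentations of one word by a weighted sum of feature values and normalizes the scores into probabilities. It is not the authors' implementation. The feature names (`crf`, `alignment`, `cooccurrence`), the feature values, the weights, and the placeholder morphemes are all assumptions made for illustration, since the abstract does not define them.

```python
import math


def log_linear_score(features, weights):
    """Weighted sum of feature values: sum over k of lambda_k * h_k."""
    return sum(weights[name] * value for name, value in features.items())


def rank_segmentations(candidates, weights):
    """Return (segmentation, probability) pairs, most probable first.

    `candidates` maps each candidate segmentation (a tuple of morphemes)
    to a dict of feature values.
    """
    scores = {seg: log_linear_score(feats, weights)
              for seg, feats in candidates.items()}
    # Softmax over the candidates, shifted by the max score for stability.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    probs = {seg: math.exp(s - top) / z for seg, s in scores.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical feature values (log-scale scores) for two candidate
# segmentations of a placeholder word; none of these numbers are from the paper.
candidates = {
    ("stem", "suffix1", "suffix2"): {
        "crf": -1.2, "alignment": -0.5, "cooccurrence": -0.8},
    ("stem", "suffix1suffix2"): {
        "crf": -0.9, "alignment": -1.6, "cooccurrence": -1.4},
}
# Hypothetical feature weights; in practice these would be tuned on held-out data.
weights = {"crf": 1.0, "alignment": 0.7, "cooccurrence": 0.5}

ranked = rank_segmentations(candidates, weights)
best_segmentation, best_probability = ranked[0]
```

In this toy setup the CRF feature alone prefers the coarser segmentation, while the alignment and co-occurrence features tip the decision toward the finer one. This shows how bilingual and monolingual evidence can override a purely monolingual model in such a combination.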

Similar resources

Rule Based Analysis of the Uyghur Nouns

This paper describes the implementation of a rule-based analyzer for nouns in Uyghur (spoken in Xinjiang, China). We hope it will contribute to further studies of the Uyghur language in machine translation and natural language processing. Like all Turkic languages, Uyghur is an agglutinative language with productive inflectional and derivational suffixes. In...


Uyghur Language Model with Graphic Structure

This paper describes a novel agglutinative language modeling strategy for Uyghur that uses a graphic language model as its structure. In the graphic language model, sentences are organized by morphemes as a directed graph, unlike the linear structure of n-gram language models. The graphic language model is verified in two typical natural language processing application scenarios, mor...


Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT

Recent research on multilingual statistical machine translation (SMT) focuses on the usage of pivot languages in order to overcome resource limitations for certain language pairs. This paper proposes a new method to translate a dialect language into a foreign language by integrating transliteration approaches based on Bayesian co-segmentation (BCS) models with pivot-based SMT approaches. The ad...


Discriminative Learning of Feature Functions of Generative Type in Speech Translation

 The speech translation (ST) problem can be formulated as a log-linear model with multiple features that capture different levels of dependency between the input voice observation and the output translations. However, while the log-linear model itself is of discriminative nature, many of the feature functions are derived from generative models, which are usually estimated by conventional maxim...


A WFST-based log-linear framework for speaking-style transformation

● Objective: Transform spoken-style language (V) into written-style language (W) for the creation of transcripts
● Approach: Statistical machine translation to “translate” from verbatim text to written text
● Innovations:
  ● Log-linear modeling for improved accuracy
  ● Introduction of features to handle common phenomena in speaking-style transformation
  ● WFST-based implementation for integration with W...


Publication date: 2017
